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ABSTRACT 

We develop a simple analytic model for the gravitational clustering of 
dark matter haloes to understand how their spatial distribution is biased rel- 
ative to that of the mass. The statistical distribution of dark haloes within 
the initial density field (assumed Gaussian) is determined by an extension 
of the Press-Schechter formalism. Modifications of this distribution caused 
by gravitationally induced motions are treated using a spherical collapse ap- 
proximation. We test this model against results from a variety of N-body 
simulations, and find that it gives an accurate description of a bias function, 
6(Af, i?, (5) = (5h(Af, i?, where (5h(Af, i?, (5) is the mean overdensity of 

haloes of mass M within spheres which have radius R and mass overdensity 
6; the results depend only very weakly on how haloes are identified in the 
simulations. This bias function is sufficient to calculate the cross-correlation 
between dark haloes and mass, and again we find excellent agreement be- 
tween simulation results and analytic predictions. Because haloes are spa- 
tially exclusive, the variance in the count of objects within spheres of fixed 
radius and overdensity is significantly smaller than the Poisson value. This 
seriously complicates any analytic calculation of the autocorrelation function 
of dark halos. Our simulation results show, however, that this autocorrela- 
tion function is proportional to that of the mass over a wide range in i?, even 
including scales where both functions are significantly greater than unity. 
Furthermore, the constant of proportionality is very close to that predicted 
on large scales by the analytic model. Since analytic formulae for the non- 
linear autocorrelation function of the mass are already known, this result 
permits an entirely analytic estimate of the autocorrelation function of dark 
haloes. We use our model to study how the distribution of galaxies may be 
biased with respect to that of the mass. In conjunction with other data these 
techniques should make it possible to measure the amplitude of cosmic mass 
fiuctuations and the density of the Universe. 

Key words: galaxies: clustering-galaxies: formation-cosmology: theory- 
dark matter 

1 INTRODUCTION 

A fundamental problem in cosmology is to understand how the spatial 
distribution of galaxies (and of galaxy clusters) is related to that of the 
underlying mass. In the standard scenario of structure formation, a dominant 
dissipationless component of dark matter is assumed to aggregate into dark 
matter clumps, the virialized parts of which are usually called dark haloes. 
Galaxies then form by the cooling and condensation of gas within these 
dark haloes (White and Rees 1978). Complex nongravitational processes 
which cannot be modelled reliably are likely to be critical in determining the 
properties of individual galaxies, yet they have little effect on the formation 
and clustering of dark haloes. As a result it is useful to approach the problem 
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of galaxy biasing by first understanding how dark haloes are distributed 
relative to the mass. 

Since dark haloes are highly nonlinear objects, their formation and evo- 
lution has traditionally been studied using N-body simulations (e.g. Frenk 
f99f ; Gelb & Bertschinger f994a,b and references therein). Such simulations 
are limited both in resolution and in dynamical range and can be difficult to 
interpret. Our understanding of their results could be substantially enhanced 
by simple physical models and the analytic approximations they provide. In 
particular, a simple and accurate analytic model could be used not only to 
carry out large parameter studies, but also to help reconstruct the mass dis- 
tribution from observations. The present paper attempts to provide such a 
model. 

The initial distribution of density fluctuations in the universe is usually 
assumed to be Gaussian, and so to be described completely by its power 
spectrum. This, in turn, is derived from a model for the origin of structure 
in the early universe (e.g. Kolb & Turner 1990). It is perhaps feasible 
to associate dark haloes with certain specific regions of the initial density 
field and to consider how these regions cluster as a result of the statistics of 
the initial conditions and of the motions induced by gravity. Kaiser (1984) 
used this idea to show how the strong clustering of Abell clusters could be 
explained using the statistics of high peaks in an initial Gaussian field. His 
formalism was developed extensively by Bardeen et al. (1986). These authors 
showed that if galaxies can be associated with high peaks of the initial density 
field then they should be more clustered than the mass, an effect usually 
called "galaxy biasing". Unfortunately it is not known how well galaxies 
correspond to high peaks of the initial field, and there is direct evidence 
that the correspondance of peaks with dark haloes is not particularly good 
(Frenk et al. 1988; Katz, Quinn & Gelb 1993). In particular, it is unclear 
how to deal with the problems that a single dark halo often contains several 
peaks, and that the present-day clustering of peaks differs substantially from 
that in the initial (Lagrangian) space as a result of gravitationally induced 
motions. Substantial progress in overcoming these difficulties has recently 
been made by Bond and Myers (1995a, b) in their "Peak-Patch Picture". In 
the present paper, however, we follow a less rigorous but simpler and more 
easily implemented route. 

Press & Schechter (1974, hereafter PS) developed a formalism which iden- 
tifies haloes at any given cosmic time with regions of the initial density field 
which just collapse at that time according to a spherical infall model. This 
theory can be extended so that it predicts not only the evolution of the 
mass function of dark haloes, but also the full statistical properties of the 
heirarchical clustering process (Bond et al. I99I; Bower I99I; Lacey & Cole 
1993; Kauffmann & White 1993). Comparisons with N-body simulation data 
show detailed agreement for a very wide range of statistical properties of the 
clustering process (e.g. Bond et al. I99I; Bower I99I; Kauffmann & White 
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1993; and particularly, Lacey & Cole 1994). This agreement is quite surpris- 
ing since the basic hypothesis underlying the PS approach is found to work 
poorly on an object by object basis (Bond et al. 1991; White 1995). 

The PS theory developed in the above papers does not provide a model 
for the spatial clustering of dark haloes, but it is easily extended to construct 
such a model. We use the standard PS formalism both to define dark haloes 
from the initial density field, and to specify how their mean abundance within 
a large spherical region is modulated by the linear mass overdensity in that 
region. We then treat the gravitationally induced evolution of clustering by 
assuming that each region evolves as if spherically symmetric. Section 2 lays 
out this model in detail and shows how it can be used to calculate statistical 
properties of the clustering of dark haloes as a function of their mass and of 
the epoch at which they are identified. Section 3 then presents detailed tests 
of these predictions against a variety of large N-body simulations. Finally, in 
section 4 we discuss how our model might be used to understand biasing of the 
galaxy distribution with respect to that of the mass, and how these methods 
may help in reconstructing the cosmic mass distribution from observations. 

2 THE MODEL 

Although the model described here may readily be extended to other 
cosmologies, the present development assumes, for simplicity, an Einstein-de 
Sitter universe (i.e. that the total mass density parameter = 1, and the 
cosmological constant A = 0). 

2.1 Initial density field and dark matter haloes 

We assume that the initial overdensity field S[x.) = [/o(x) — p]/p (whose 
Fourier transform is denoted by 6^) is Gaussian and is described by a power 
spectrum P{k). The field S[x.) can be smoothed by convolving it with a 
spherical symmetric window function W{r; R) having comoving characteristic 
radius R (measured in current units). The smoothed field is 

6{^-R) = I W{\^-y\-R)6{y)d^j 

= I W{k;R)6i,exp{ik-^)d^k, (1) 

where W{k; R) is the Fourier transform of the window function W{r; R). A 
useful quantity characterising the power spectrum is the rms fiuctuation of 
mass in a given smoothing window: 

A'{R) = {[6{^; R)]') = J P{k)W'{k; R)d^k. (2) 
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For a given window function the smoothed field (5(x;i?) is Gaussian and so 
has the following one-point distribution function 



p{8-R)d8 



1 



1 d8 



(3) 



(27r)i/2 



exp — 



2A2(i?)J A(i?)' 



Since both 8 and A(i?) grow with time in the same manner in linear per- 
turbation theory, it is convenient to use their values linearly extrapolated 
to the present time. It is clear that these extrapolated quantities still obey 
equation (3). In what follows, we write our formulae in terms of the extrap- 
olated quantities, unless otherwise stated. Also we will omit writing explic- 
itly the smoothing radius i?, but we will often use subscripts to distinguish 
A, and other quantities, at different smoothing lengths [e.g. Aq = A(i?o), 
Ai = A(i?i)]. For a top-hat window function, which we adopt throughout 
this paper, the average mass contained in a window of radius R is simply 
M[R) = (47r/3)/ji?"^, where p is the mean density of the universe. For a given 
power spectrum P{k), the quantities i?, A and M are equivalent variables. 

We will assume that dark halos are spherically symmetric, virialized 
clumps of dark matter. In an Finstein-de Sitter universe a spherical per- 
turbation of linear overdensity 8 collapses at redshift Zc = 8c j 8 — I, where 
8c = 1.686. It is usually assumed that a collapsing structure virializes at half 
its radius of maximum expansion, implying a density contrast at the time 
of collapse of about 178. The mass Mi of a halo is related to the initial 
comoving radius i?i of the region from which it formed by 



Note that we will always label the properties of dark haloes (i?, Af, A etc.) 
using the subscripts I, 2, ... We will reserve the subscript for the properties 
of uncoUapsed spherical regions. 

According to PS theory, the probability that a random mass element is 
part of a dark halo of mass exceeding Mi at some given redshift Zi is just twice 
the probability that a surrounding sphere of mass Mi in the initial conditions 
has linearly extrapolated overdensity greater than 8c at that redshift. This 
probability is 



Ml = fpRl 



(4) 



where p[8; Ri) is given by equation (3). This equation can be rewritten as 




(5) 
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where Pi = (5i/Ai, we define 6i = [1 -\- Zi)6c^ and erfc[x) is tlie complemen- 
tary error function. Tlie differential mass distribution is then 



f{Mi,zi)dMi 



dF 



dMi 



Si 



(27r)i/2 Al 



exp 



SI 



2AI 



dAi 
dWi 



dMi. (7) 



Hence the comoving number density of haloes, expressed in current units, as 
a function of Mi and Zi is 



n{Mi,zi)dMi 



1/2 



p 6i JlnAi 
Ml Jin Ml 



exp 



A. 

2Al 



dMi 
~M^ 



(8) 



where p is the current mean density of the universe. Notice that in this 
theory a class of haloes must be defined by specifying both their mass Mi (or 
equivalently Ri or Ai) and their redshift of identification Zi (or equivalently 

We now need formulae which relate halo abundances to the density field 
on larger scales. Bond et al. (1991) derive the probability that the overden- 
sity at a randomly chosen point is Sq when the initial density field is smoothed 
on scale Rq and does not exceed 6i for any larger smoothing scale: 



q{6o, 6i; Ro)d6o 



1 



(27r)i/2 



exp 



a: 



exp 



('^o - 2Si f 
2AI 



dSo 
A~o- 



(9) 



for Sq < 6i and q = otherwise. In line with their reinterpretation of PS the- 
ory, they consider this to be the probability that a spherical region of initial 
radius Rq has linear overdensity Sq and is not contained in a collapsed object 
of mass exceeding Mq at redshift Zi given hy 6i = [1 -\- Zi)6c. Notice that 
our subscript convention means that Sq refers to an uncoUapsed region and 
so should be interpreted as the linear overdensity of that region extrapolated 
to the present, whereas (5i, 62^ etc. apply to collapsed halos and so should 
interpreted as 6c times 1 plus the redshift at which each halo is identified. 

Bond et al. (1991) also extend their PS argument to show that the 
fraction of the mass in a region of initial radius Rq and linear overdensity 
Sq which at redshift Zi is contained in dark haloes of mass Mi (where by 
definition Mi < Mq) is given by 



f{Ai,6i\Ao,6o)^mi 



1 



61 - 60 



(27r)i/2 (A2 - Aiyr- 



exp 



(<^i - <^o)' 



2{Al 



dAj 



dMi 
(.10) 

Thus the average number of Mi haloes identified at redshift Zi in a spherical 
region with comoving radius Rq and overdensity Sq is 



dMi. 



Af{l\0)dMi 



Ml ^ ' UMi 
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where /(1|0) = /(Ai, (5i|Ao, i^o)- Notice that since Mi is identified as a 
collapsed halo at 2;i > whereas Mq is assumed uncoUapsed at z=0, we have 
6i > Sq. Equations (9) to (11) turn out to be sufficient for us to derive some 
interesting results for the pattern of halo clustering "imprinted" on the initial 
conditions as a result of the statistical properties of gaussian fields. 

2.2 Clustering of haloes in Lagrangian space 

From the preceding analysis it is clear that the number of halos of mass 
Ml, identified at redshift Zi^ which form from the matter initially contained 
within spheres of radius Rq and linear overdensity Sq h gnificant depen- 

dence on Sq. It is useful to quantify this by calculating the average overabun- 
dance of halos in such spheres relative to the global mean halo abundance. 
This is simply 

*l|0)^-fi^-l, (12, 

where Vq = AttRq/S^ and the other quantities are taken from equations (8) 
and (11). This expression becomes particularly simple when the mass con- 
tained in the larger region is much greater than that of the haloes considered. 
When Rq ^ i?i (so that Aq ^ Ai) and \6o\ <^ we have 

<^h(l|0) = ^^o, (13) 

where again Pi = 6i/Ai. Thus the halo overdensity in these Lagrangian 
spheres is directly proportional to the linear mass overdensity. The constant 
of proportionality is the same as the one obtained from a related argument 
(sometimes called the peak-background split) by Efstathiou et al. (1988) and 
Cole & Kaiser (1989). It is useful to define a characteristic mass M^[zi) for 
the nonlinear clustering at redshift Zi through the requirement 

A{M,) = 6i = 6,{l + zi). (14) 

Equation (13) then shows that haloes with mass Mi exceeding Af* are initially 
biased towards regions of positive linear overdensity, while those with Mi < 
M^ are initially biased to regions of negative linear overdensity. Notice also 
that while the positive bias factor can be very large for Mi ^ Af* the antibias 
cannot be more nee ative than S^^/So = -1/Si> -0.59. 

By combining equations (9) and (12) we can immediately calculate a 
measure of the cross-correlation between the number of dark halos and the 
amount of mass in Lagrangian spheres of radius Rq. We define such a measure 
by 

CtJRo,Mi,zi) = {Stil\0)So)R, = — — / So AfmqiSo,Si;Ro)dSo, 

(15) 
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where (••)_Ro denotes an average over all Lagrangian spheres of radius Rq 
that are not contained in a collapsed object at redshift Zi. We use the 
notation because this definition gives an average of the standard cross- 
correlation between halos "h" and mass "m" in Lagrangian space "L". This 
average is carried out over a sphere of radius Rq according to ^{Rq) = 

/ / ~ yDd^xd^y . We refer to and similar quantities as "av- 
erage" (cross)-correlation functions. Notice that equation (15) requires only 
Ml < Mq. It nowhere assumes that Sq or S^^ should be small. We will show 
below that it provides a good description of our simulation data well into 
the nonlinear regime ^^^^^ > 1. On large scales, however, it simplifies using 
equation (13) to give 



Notice that this quantity measures clustering in the pattern of dark halo 
formation sites at early times when the mass is almost uniform. We now 
turn to measures of clustering in the current universe where the positions of 
dark halos have been modified as a result of gravitationally induced motions. 

2.3 Dynamical evolution of clustering 

To model the clustering of dark halos at recent epochs we have to be 
able to calculate their expected abundance in spheres which at the desired 
redshift z have radius R and (possibly) nonlinear overdensity 6. We relate 
these quantities to the initial Lagrangian radius Rq and the extrapolated 
linear overdensity Sq of the last section by using a spherical collapse model. 
In such a model each spherical shell moves as a unit and different shells do not 
cross until very shortly before they collapse through zero radius. Thus the 
mass interior to each shell is constant, giving (1 -|- 6)R^ = R^. Furthermore, 
since dark halos in our PS model are defined to be objects identified at 
some specific redshift, the mean abundance of equation (11) can be taken as 
referring to halos of mass Mi identified at redshift Zi within spheres of radius 
R{Ro^ Sq^ Zi) and overdensity 6[6o^Zi). 

For a spherical perturbation in an Finstein-de Sitter universe, the physical 
radius i? of a mass shell which had initial Lagrangian radius Rq and mean 
linear overdensity Sq is given for (5o > by 




R{Ro,So,z) 3 1 — cos^ 

r'o " To \So\ 



(16) 



1 _ 3 X 62/3(^-sin^)2/3 




1 + z 20 
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For Sq < 0, we just replace (1 — cos^) in equation (16) by (cosh^ — 1) and 
[6 — sinO) in equation (17) by (sinh^ — 6). Without loss of generality, let us 
assume 2; = at the time when the clustering of haloes is examined. Then Sq 
depends only on the present mass overdensity 6 = [Rq/ Ry — 1. The relation 
between these two quantities can be approximated accurately by 

So = -1.35(1 + (5)-2/3 + 0.78785(1 + sy^-^^"^"^^ - 1.12431(1 + S)-^^"^ + 1.68647. 

(18) 

This interpolation formula has the correct asymptotic behaviour near S = —1 
and as (5 — )• 00, as well as in the vicinity of 6 = 0. 

Under the above assumption the average overdensity of dark haloes in 
spheres with current radius R and current mass overdensity 6 can be obtained 
immediately from equations (8) and (11): 

where V = 47ri?"^/3, Rq = R[l + Sy^^^ and Sq is determined from 6 using 
equation (18). When Rq ^ i?i and \6o\ <^ we have 

6^{1\0) = b{M,,z,)6 = (1 + ^^^)S. (20) 

Again we find that halo overdensity is directly proportional to mass over- 
density but the constant of proportionality (usually known as the linear bias 
parameter) is now always positive. The first term in the parenthesis defining 
b[Mi^Zi) comes from the contraction (or expansion) of the spherical region, 
while the second refiects the bias in the initial density field as given by equa- 
tion (13). It is interesting to note that for equation (20) to be valid, it is not 
necessary to have 6 <^ 1. Indeed, for haloes identified at redshifts of one or 
greater we show below that equation (20) can hold for 6 substantially greater 
than unity. We also note that for Mi = M^[zi) we have Vi = 1 and so 6 = 1; 
thus Af* haloes are predicted to be unbiased relative to the mass. 

In analogy with the analysis of the last section we can now define an 
average cross-correlation between dark haloes and mass, this time for spheres 
of fixed radius i? at 2; = 0, 

eh4^,Mi,zi) = (<^h(l|0).5)« = — — / 8M{mp{8-R)d8, (21) 

n(Mi, Zi)v J -00 

where (••)_r denotes an average over all spheres with radius i? at 2; = 0, 
p{8] R) is the probability distribution function (PDF) of the mass overdensity 
in such spheres, and Rq and 80 in A/'(1|0) are related to R and 8 by the 
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spherical collapse model. There have been many attempts to model the 
nonlinear PDF (often called the counts-in-cells distribution) for a given initial 
spectrum of gaussian density fluctuations (e.g. Bernardeau 1994, Colombi 
1994). Unfortunately, the models proposed so far work reasonably well only 
in the linear and quasilinear regimes 6^1. In testing our model, we will 
often use a PDF derived directly from our N-body simulations. However, 
for illustration, we will also show results obtained using a simple lognormal 
approximation to the PDF (Coles & Jones 1991): 



p{S; R)d8 = ^ exp 



(ln/> + af/2) 
2af 



dS 

-, (22) 



where = (1 + (5), = ln[l + (T^(i?)], and cr(i?) is the rms overdensity 
fluctuation in a sphere of radius R. The latter can be obtained from the 
initial power spectrum through the formula given by Jain, Mo & White 
(1995). It turns out that such a PDF works remarkably well on scales where 
the average mass correlation function (^m(-R) ~ 1- 

2.4. The autocorrelation functions of dark haloes 

In analogy with the procedures of the last section we can define an average 
autocorrelation function at 2; = for haloes of mass Mi identified at redshift 
zi by 

Chh(i?,Mi,zi) = ([<5h(i|0)] )r = — — — —2 1, (23) 

[n{Mi,zi)V\ 

where, as before, Rq and Sq in A/'(1|0) are related to R and 6 by the spherical 
collapse model. In the limit where i?i <^ Rq and \6o\ <^ 61 this gives 

ehh(i?,Mi,zi) = [b{Mi,zi)fU{R), (24) 

where (^m(-R) is the average mass correlation function. If 61 is large (i.e. 
Zi > 1), equation (24) follows from eq.(23), even when (^m(-R) ~ 1- 

It is important to note that the average correlation function i^hh defined by 
equation (23) is not the same as the conventional one based on the variance of 
counts in randomly placed spheres (e.g. Peebles 1980, §36). This is because 
Chh{R) does not include the scatter of halo counts among spheres which have 
the same mean mass overdensity as well as the same radius. Let us denote 
the conventional average autocorrelation function by cr^j^. Then 

J J [n{Mi,zi)V\ n(Mi,zi)V 

(25) 
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where fi[R^6) is the mean square scatter of halo counts in spherical regions 
with radius R and overdensity 6. The discreteness term in equation (25) (the 
last on its rhs) cancels the second term if the scatter among the counts in 
spheres of the same R and 6 (and so the same Rq and Sq) is Poisson. However, 
since haloes are spatially exclusive this is not the case, and we expect that 
(/i(i?, <^))_R will be significantly less than n(Afi, Zi)V for small R. Our ^hh{R) 
is then not equivalent to c^h(-R)- 

We will see below that halo exclusion effects are indeed important when 
count variances are estimated for massive haloes in N-body simulations. They 
modify the values of (T^j^(i?) out to radii that are much larger than the typ- 
ical sizes of haloes, giving a result which is systematically shallower than 
either ^hh{R) or ^^(R). Thus, without a proper understanding of the scatter 
fi{R^6)^ the usefulness of our model for predicting cr^j^ is limited. This also 
means that cr^j^ is related to the mass correlation function in a more com- 
plicated way than is suggested by equations (23) and (24). In principle, the 
scatter could be modelled by examining the formation histories of individ- 
ual spherical regions of radius R and overdensity (5, as one does in studying 
the merging histories of individual dark haloes (e.g. Kauffmann and White 
1993). This would, however, lead to a much more complicated model. In 
this paper, we will adopt a different approach. 

Halo-halo exclusion must lead to a deficit of pairs at separations compa- 
rable to the sizes of haloes, but the pair count at larger separations could 
plausibly be almost unaffected. Thus the standard two-point correlation of 
haloes (as opposed to the average correlation functions we have been consid- 
ering so far) may be insensitive to exclusion effects for large separations. To 
see this more clearly, recall that the two-point correlation function of haloes 
may be estimated as 

AirnR'^AR 

where AP[R) is the average number of neighbours of a randomly chosen halo 
in the separation interval R it AR/2 and n is the mean number density of 
haloes. Now imagine splitting every halo into two halves each of which is 
counted separately. It is clear from equation (26) that ^hh{R) is unchanged 
for all R larger than the splitting radius. The situation is quite different for 
(T^j^(i?). From the definition of this quantity we can write (see Peebles 1980, 
§36) 

where {N'^[R)) is the mean square count of haloes in spheres of radius R and 
again V = AttR^ /3. If we try splitting each halo in this case, the first two 
terms on the rhs of equation (27) are unchanged, while the last is reduced by a 
factor of two. Thus when nV is small or the correlation signal is weak, cr^i^{R) 
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can be affected significantfy by smaff scafe cfustering. Based on this, it seems 
preferabie to use ^hh{R) rather than (T^j^(i?) to measure the autocorreiation 
of haioes on iarge scales; the effects of halo exclusion should then be reduced. 

Unfortunately, as we have seen, our model based on the extended PS 
formalism does not lead directly to a model for i^hh, or even for a^^{R). 
However, on large scales, we expect that the linear bias relation of equation 
(24) will apply also to (^hh(-R)- We can then write 

ehh(i?) = [K^l,^l)]'em(i?), (28) 

where (^m(-R) is the standard autocorrelation function for the mass. When 
haloes with a range of masses are considered, 6(Afi, Zi) should be replaced in 
this equation by the value obtained by averaging it over Mi with a weight- 
ing of n[Mi^Zi). As we have discussed before, the linear bias relations of 
equations (20) and (24) are valid if Rq ^ i?i and \6o\ <^ 6i. Hence, equation 
(28) may hold even for i^m > 1, especially when Zi ^ 1. In subsection 3.4, we 
will show that the autocorrelation of haloes in our simulations is described 
quite accurately by equation (28), even in the nonlinear regime of This 
is an important result, because it means not only that ^hh{R) is very simply 
related to i^m, but also that it can be estimated purely analytically using the 
analytic formulae for given by Jain et al. (1995). 

3 COMPARISON WITH NUMERICAL SIMULATIONS 

We now test our analytic theory by detailed comparison with the re- 
sults from a series of large cosmological N-body simulations. These simula- 
tions were performed using the particle-particle/particle-mesh (P^M) code 
described by Efstathiou et al. (1988) and are very similar to the simulations 
of that paper. However, they are substantially larger [N = 10^) and have 
higher resolution (gravitational softening length equal to iv/2500, where L 
is the side of the fundamental cube of the periodic simulation region). The 
initial conditions for each simulation imposed growing mode fluctuations cor- 
responding to a Gaussian random field with power-law fiuctuation spectrum, 
P{k) oc A;", onto a uniform "glass-like" initial particle load (see White 1995). 
All models assumed an Einstein-de Sitter universe and so their time evolution 
is expected to be self-similar. The initial power spectrum was normalized as 
described by Efstathiou et al. (1988) and "time" is measured by expansion 
factor a since the start of the simulation (a = 1 for the initial conditions). 
Each model evolved further than those of Efstathiou et al. (until the largest 
virialized clusters contained more than 10"* partices) and a repetition of the 
tests of that paper showed that similarity scalings are accurately obeyed 
throughout the evolution. Some examples of such tests can be found in Jain 
et al. (1995) where correlation functions and power spectra for these same 
simulations were analysed. 
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For the purposes of this paper we need to define catalogues of dark haloes 
in the simulations. For most of our discussion we will use catalogues con- 
structed by using the standard 'friends-of- friends' (FOF) group finder with 
a linkage length equal to 20% of the mean interparticle distance (e.g. Davis 
et al. 1985). This algorithm is easy to implement and has been extensively 
tested against the PS mass function (see Lacey & Cole 1994 for a careful 
discussion). For comparison, however, we will also present some results ob- 
tained using the spherical-overdensity (SO) grouping algorithm invented by 
Lacey & Cole and kindly made available by them. This algorithm is based 
on finding spherical regions with a certain predefined mean overdensity n. A 
local density near each particle is needed to provide an initial list of possible 
halo centres, and is defined as 3(A^ -|- l)/(47rr}^), where rjv is the distance to 
the N'th nearest neighbour. Further details may be found in Lacey & Cole's 
paper. We follow them in choosing n = 180 and N = 10. In this algorithm 
the mass of a halo is simply the number of particles within the bounding 
sphere. 

Many of the statistics we discuss in this paper are based on counts of 
haloes or of individual particles within randomly placed spheres. When eval- 
uating such statistics for the simulations we use counts of objects within 
spheres centred on each grid point of a regular 30"^ cubic mesh. Since the 
simulations are periodic, there are no difficulties with spheres overlapping 
the boundary of the simulated region. 

3.1 Cross-correlation between haloes and mass in Lagrangian space 

Our estimate of the average Lagrangian cross-correlation given in 

equation (15), requires no assumptions beyond those of the extended PS 
formalism. It thus makes a good point to start testing our model. In order 
to calculate in the simulations, we select dark halos at some time a > 1. 
The Lagrangian position of a halo is then taken to be the centre of mass of 
the initial unperturbed positions of its constituent particles. Counts of these 
halo positions can be computed within spheres of given Lagrangian radius 
i?o- Only spheres with extrapolated linear overdensity less than 8c at the time 
of halo identification are used when calculating the relevant averages since, 
by hypothesis, spheres with larger Sq are part of collapsed dark haloes, and 
are excluded from our model by the definition of q in equation (9). Figure 1 
shows the ratio (^j^(i?o)/i^(-Ro) as a function of log{Ro/L), where ^ oc 
is the average autocorrelation of the extrapolated linear overdensity and is 
computed directly from the linear power spectrum using equation (2). 

The different symbols in figure 1 refer to halos of different mass as de- 
tailed in the caption. The corresponding model predictions are shown as the 
solid curves and are obtained from equation (15) by integrating n[Mi^Zi) 
and J\f over the appropriate range of halo mass Mi . The agreement between 
the model and the simulation results is good for all except the most massive 
haloes. The problem with the latter almost certainly arises from poor statis- 
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tics since the simulations contain relatively few such haloes. For each power 
index n and expansion factor a, more massive haloes are more strongly cor- 
related with linear overdensity. Haloes with M < Af* are anticorrelated with 
linear overdensity as expected from equation (13). The agreement found here 
is not surprising since Lacey & Cole (1994) already showed many aspects of 
the extended PS formahsm to be in excellent agreement with simulations 
similar to our own. 

3.2 The bias relation 

The second major ingredient of our theoretical formalism is the spher- 
ical model which we use to relate the Eulerian quantities R and 6 to the 
Lagrangian quantities Rq and Sq. To test this model let us define a bias 
function b through the relation 

6h{m = b{R,6,M^,z^)6. (29) 

Equation (19) shows that this function depends only on the extended PS 
formalism and on the spherical collapse model. 

In Eigure 2, we show log[l-\-6fi) as a function of log[l-\-6) for several values 
of R/ L The vertical error bars give the rms scatter of the 6^ values in the 
corresponding 6 bins. In our simulations the Lagrangian radius of a halo of 
mass M is about 0.02(Af/32)^/"^iv. The smallest radius for which results are 
shown in Eigure 2 corresponds to the Lagrangian radius of the least massive 
haloes considered [M = 32). This radius is also smaller than the scales where 
the mass correlation function becomes nonlinear. It is remarkable that the 
analytic model reproduces the simulation results over a wide range of R and 
over almost the entire range of 6 that is probed by the present simulations. 
In the analytic model, the value of 6^ drops abruptly to —1 when 6 becomes 
so small that the total mass contained in a sphere is smaller than the mass of 
the smallest haloes. In the simulations the drop is less abrupt because some 
of the member particles of a halo can be outside the sphere that contains 
its centre. Eor the largest values of 6 statistics are poor in the simulations 
because very few of our 30"^ spheres have such values. The scatter in 6^ 
among spheres with the same 6 is generally smaller than the mean value of 
Sfi. Exceptions occur for small spheres and for "almost empty" spheres. Thus 
it is a reasonable approximation to treat 6^ as a function of 6 as in equation 
(29). The increase of 6^ with 6 is faster for haloes with higher M/M^^ showing 
again that haloes with high M/M^ are biased toward regions with large mass 
overdensities. 

Eigure 3 shows the same thing as Eig.2, except that haloes are selected 
at an earlier epoch (when the expansion factor is a = ai) than the one when 
the 6fi-6 relation is examined (at a = a2 > Oi). In general, haloes identified 
at ai will, by a2, have increased their mass by accretion or lost their identity 
by merging. However, galaxies which were forming at their centres at ai may 
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still remain distinct at 02- Thus the relation examinined in Figure 3 may be 
relevant to the biasing of galaxies. In the simulations the position of each 
halo (or "galaxy") at the later epoch is assumed to be that of the particle 
which was closest to its centre at ai. The horizontal error bars in Figure 3 
represent the scatter in 6 when the count data are rebinned according to the 
value of Sfi. These bars are also relatively narrow, showing that it may be 
possible to use "galaxy" overdensities (i.e. S^) to predict mass overdensities 
(i.e. 6). The model predictions in Figure 3 are obtained from equation 
(19) with Si in n (eq.9) and in JV (eq.l2) taking the value Si = (a2/«i)'^c- 
Clearly the model also works gratifyingly well in this double epoch context. 
We conclude that our formalism provides a useful description of the bias 
function 6(i?, S^ Afi, zi). 

3.3 Cross-correlation between haloes and mass in Eulerian space 

The simplest of the Fulerian correlation statistics for dark haloes is the 
average cross-correlation between haloes and mass. This quantity depends 
only on the mean number of dark haloes in spheres of given radius R and mass 
overdensity S^ and it is independent of the scatter in this number. Because of 
this simplification we begin our comparison of simulation and analytic results 
for correlations with this statistic. Unfortunately to calculate any Fulerian 
correlation statistic for dark haloes, it is necessary to make some assumption 
about the probability distribution of the mass overdensity S. As noted above, 
this is a significant problem because there are currently no theoretical models 
for this PDF which remain accurate in the nonlinear regime, A > 1. In the 
following we either take the PDF directly from the simulation itself or use 
the simple lognormal approximation of equation (22). 

Figure 4 shows the ratio of the average cross-correlation between haloes 
and mass to the average autocorrelation of the mass. The heavy ticks on 
the horizontal axis show the values of R where i^m = 1- Open symbols give 
results for haloes in various mass ranges as identified by the FOF group 
finder. Filled dots in Figure 4a give corresponding results for the SO group 
finder. We see clearly that the two group finders give similar results for the 
average cross-correlation function i^hm- For the case n = —1.5, the SO haloes 
have slightly higher i^hm than the corresponding FOF haloes because the SO 
group finder breaks clusters into subgroups more efficiently than the standard 
FOF group finder; this property starts to become significant in a model with 
substantial large-scale power. 

Moving to our analytic predictions, the solid curves in Fig. 4 show re- 
sults from equation (21) using a PDF derived directly from the simulations. 
These curves match the simulation results in most cases, but not for low 
M I when i^m > 1- Haloes with such low M/M^ are biased toward un- 
derdense regions in the Lagrangian space (Fig.l) and our spherical collapse 
model may be an inadequate description of the nonlinear evolution of under- 
dense regions. The dashed curves in Figure 4a show predictions based on the 
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lognormal PDF of equation (22). When ~ 1 these predictions agree with 
those based on the empirical PDF, but in the nonlinear regime the lognor- 
mal predictions are, in general, significantly worse. A better analytic model 
for the PDF would clearly improve our ability to make predictions in the 
nonlinear regime. Notice, however, that both predictions remain accurate 
to values of exceeding unity. This often corresponds to i^hm values much 
above unity. 

In Figure 4b we show " two epoch" cross-correlations between haloes and 
mass. As in Figure 3 the haloes are identified at ai but the correlations are 
calculated at the later epoch a2 > Oi. In this case the agreement between 
the analytic model (using the empirical PDF) and the simulation results 
is excellent even on scales where the mass autocorrelation is substantially 
nonlinear. This reinforces the idea that careful use of our analytic techniques 
should allow an accurate prediction of galaxy correlations for any specific 
model of the relation between galaxies and the haloes in which they form. 

3.4 Autocorrelations of haloes in Eulerian space 

Fstimating average autocorrelations for dark haloes requires knowledge 
of the scatter in the number of haloes in spheres of given radius and mass 
overdensity in addition to knowledge of the mean number. As discussed in 
§2.4, we have not yet found an adequate way to calculate this within our 
formalism. We now demonstrate explicitly that improper treatment of halo 
exclusion effects can cause serious errors in the estimate of average autocorre- 
lations, but that a simpler linear bias model for the standard autocorrelation 
function actually works quite well. 

In Figure 5 we use individual symbols to show the ratio cr^i^{R^ Afi, Zi)/^^[R) 
as a function of R for haloes identified using the FOF group finder. Dashed 
curves in the left-hand panels show corresponding results for SO haloes. On 
small scales the SO values are systematically higher than the FOF values, 
particularly for massive haloes. As before, this is because the SO group finder 
breaks clusters into subgroups more efficiently than the FOF algorithm. Solid 
curves in this figure show the predictions of equation (23). The discrepancies 
are substantial on small scales, and indeed on all scales for n = 0. Count 
variances in the simulations increase less rapidly to small scales than either 
Cm or (^hh as given by equation (23). This discrepancy appears due to halo 
exclusion effects as discussed in §2.4 (see equation 25). This is consistent 
with the fact that the predictions are better for large i?, for small Af , and 
for haloes identified at an earlier epoch than the one for which the average 
autocorrelation is calculated (see the right panels of Fig. 5). For n = clus- 
tering on large scales is weak, and the effect of halo exclusion propagates to 
large scales in (T^j^(i?). 

In order to avoid these exclusion problems we now consider the stan- 
dard autocorrelation function of dark haloes {^hh{R) defined by equation 26) 
rather than the average autocorrelation function of Figure 5. In Figure 6 we 
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use squares to show this autocorrelation lunction lor haloes identified using 
the FOF algorithm. The error bars represent bootstrap errors, as discussed 
in Mo, Jing & Borner (1992). Dashed curves in Fig. 6a show the correspond- 
ing results lor SO haloes; lor reasons anticipated in the discussion ol §2.4, 
Chh{R) does not depend significantly on the group finder lor R larger than 
the linear sizes ol haloes. (The linear sizes are about 0.02iv and 0.03iv lor 
haloes with Af = 32 and 128, respectively.) The solid curves in Figure 6 
show the predictions ol equation (28), with (^m(-R) estimated directly Irom 
the simulations. In practice, can equally well be obtained Irom the initial 
density power spectrum by using the fitting formula ol Jain et al. (1995); 
this procedure allows a purely analytic estimation ol halo autocorrelations. 
It is clear Irom Fig. 6 that this simple model fits the simulation results quite 
well over a wide range ol scales. The thick ticks on the horizontal axes mark 
the values ol R where (^m(-R) = 1- Our model can work well even lor i^m > 1; 
this is particularly the case when autocorrelations are estimated lor haloes 
identified at some earlier epoch. On the other hand, the model can break 
down on scales smaller than the linear sizes ol haloes because ol the exclusion 
effects discussed above. 

4 DISCUSSION 

The tests ol the last section show that the spatial distribution ol dark 
haloes and its relation to the underlying mass distribution can be described 
quite accurately by our simple analytic model. As a result this model should 
provide some indication ol how the observed galaxy distribution may related 
to that ol the mass. In this section, we will briefiy discuss several possible 
applications ol these results. Details ol these applications will be described 
elsewhere. 

As a first application consider how the galaxy formation process may 
bias the galaxy distribution. The results presented in Figures 2 and 3 show 
clearly that the bias ol dark haloes depends not only on their mass but also 
on the epoch when they are identified. Indeed, lor haloes ol a given mass, the 
bias increases strongly with the redshilt ol identification. Thus the objects 
which formed at the centres ol early, relatively low-mass haloes can be more 
strongly clustered today than current haloes ol larger mass. This result is 
interesting. In a hierarchical clustering scenario, such as the CDM model 
or any ol its currently popular variants, high mass haloes form through the 
merger ol smaller systems. II these early haloes produced galaxies which 
survive to the present day, these galaxies could be just as strongly clustered 
as the more massive galaxies which form later. A strong mass (or luminosity) 
dependence ol galaxy correlations, in the sense that brighter galaxies are more 
strongly clustered, is not a necessary consequence ol the hierarchical model. 
Our model predicts that objects which form at redshilt z in haloes with 
mass M = M^[z) will be unbiased relative to the mass at all later redshilts 
(equation 20). A strong "natural" bias in the galaxy distribution ol the kind 
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advocated by White et al. (1987) can only be achieved if most galaxies formed 
at the centres of haloes with M > Af*. At any given time massive dark haloes 
may, of course, contain more than one galaxy, and some low mass haloes 
may contain no galaxies at all. The observed galaxies do not, presumably, 
correspond uniquely to the centres of the haloes present at any single epoch. 
As a result, it is not straightforward to apply our results directly to galaxies. 
However, if more detailed modelling allows a prediction of the number of 
galaxies in a halo as a function of its mass and of galaxy properties (e.g. 
Kauffmann, White & Guiderdoni 1993; Kauffmann, Guiderdoni & White 
1994; Cole et al. 1994) our results can readily be extended to study galaxy 
clustering as a function of luminosity, morphological type, colour, or any 
other property of interest. 

A second application is to the study of how the bias function 6(i?, (5, Afi, Zi) 
depends on 6 and R. In particular, our model should enable us to see the 
extent to which the assumption of linear bias (i.e. b = const.) is valid. In 
Figure 7 we show model predictions for the Si^-S relation for spheres with 
R = 0.05iv. We see clearly that the relation can deviate from the simple lin- 
ear one, (5h oc 6. The deviation is larger for haloes identified at later epochs, 
for more massive haloes, and for smaller values of R. To see this depen- 
dence more clearly, let us expand (5h to second order in 6 and to first order in 
T] = (Ao/Ai)^. From equation (18), we have Sq = 6 -\- cP with c = —0.805. 
Fxpanding equation (19) to the necessary order gives 

h^ = B^\Brh\\^B^h\ (30) 

where Bq = (?y/2)(3 — p^) is a zero-point offset, i?i = 1 -|- (z/^ — is the 

hnear bias of equation (20), and B2 = (z/i/(5i)^(z/^ ~ 3) -|- 2{v^ — 1)(1 -|- c)/6i. 
The second order coefficient B2 can be either positive or negative depending 
on the value of p^. In particular, for haloes of mass Mi < M^[zi) we have 
Pi < 1 and (5h always falls below the linear bias relation for large 6'^. This 
negative second derivative is quite evident in the curves of Figure 7, as is the 
zero-point offset which is largest for massive haloes. 

Relations such as equation (30) may also have important implications for 
the skewness and other high-order moments of the halo count distribution. 
If we could neglect the scatter in halo counts for spheres of given R and (5, 
then the formulae of Fry & Gaztanaga (1993) would allow the skewness of 
the halo counts to be written as Sh = {Sm + 3i?2/-Bi)/i?i, where Sm is the 
skewness of the PDF of 6. For Af* haloes identified at redshift Zi this gives 
Sh = Sm — 6/(5j, which is substantially smaller than Sm unless Zi is high. 
While additional contributions to the skewness of halo counts come from 
scatter terms analogous to the fi[R^6) term in equation (25), this reduction 
of Sh caused by the nonlinearity of the mean bias relation may explain why 
late-type galaxies have a lower value of skewness than early-type galaxies 
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(Jing, Mo, Borner 1991). This line of argument is obviously worth further 
investigation. 

The results of §3.4 show that to a good approximation the halo autocorre- 
lation function is directly proportional to the mass autocorrelation function 
with a constant bias which can be calculated simply using our model [equa- 
tions (20) and (28)]. This validates the usual linear bias assumption well into 
the nonlinear regime, and shows directly how the bias is related to the "peak 
height" of haloes through the parameter Pi. From equation (6) we see that 
this parameter also determines the fraction of the cosmic mass in haloes with 
masses exceeding Mi. If the abundance of a certain type of haloes is known 
as a function of their mass, then this fraction can be estimated, and so Pi 
calculated, for any assumed value of the cosmic density parameter Oq- This 
in turn allows the mass correlation function to be estimated directly from 
the observed halo correlation function. Thus the conventional normalization 
amplitude for mass fluctuations erg is obtained, again as a function of the 
assumed Oq- One can then try different values of Oq and require that the 
(78 obtained from this argument is consistent with other determinations [e.g. 
from large-scale flows (Dekel 1994), from cluster abundances (White, Frenk 
& Efstathiou 1993) and from weak gravitational lensing (Villumsen 1995)]. 
In this way one may hope to measure Oq- 
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Figure 1: The average cross-correlation function between haloes and mass in 
Lagrangian space, normalized by the linear average mass correlation function. 
Results from simulations are shown for haloes with mass Af > 32 (triangles), 
64 (three-pointed stars), 128 (squares), and 256 (crosses). The solid curves 
are the model predictions. The statistics for the M > 256 samples in the 
simulations are poor, because they contain only small number of haloes. The 
power index n and the expansion factor a are shown in each panel. 




Figure 2a: The bias relation, i.e. the overdensity of haloes (5h versus the 
overdensity of mass 6 in spherical windows with radius R. Circles, squares 
and triangles show the simulation results for R/L = 0.02, 0.05 and 0.13, 
respectively. To avoid crowding, the results for R/L = 0.05 and 0.13 are 
shifted by 1 and 2 decades along the horizontal axis. Vertical error bars 
show the Icr scatter of the (1 + S^) values in the corresponding 6 bins. Model 
predictions are shown by the solid curves. Results are shown for haloes with 
Af > 32 (left panels) and M > 128 (right panels). Here haloes are selected at 
the same epoch a as when the correlation function is calculated. The power 
index n and the expansion factor a are shown in each panel. 
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Figure 2b: The same as Figure 2a for a different set of expansion factors. 
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Figure 3: The same as Figure 2, but here haloes are selected at an earlier 
epoch (when the expansion factor a has the lower value indicated in each 
panel) than when the 6fi-6 relation is examined (at the epoch with the higher 
a). The horizontal error bars represent the scatter of 6 among spheres within 
the corresponding 6^ bins. 




Figure 4a: The average cross-correlation function between haloes and mass, 
normalized by the autocorrelation function of the mass. Triangles, three- 
pointed stars, squares and crosses show the results of simulations for haloes, 
identified by the standard FOF group finder, with mass M > 32, 64, 128, 
and 256, respectively. The solid circles show the corresponding results for 
haloes identified by the SO group finder. The solid curves show the model 
predictions, with the PDF derived directly from the simulations. The dashed 
curves show the results obtained from the lognormal PDF. The thick ticks on 
the horizontal axis show the values of R where the average mass correlation 
function i^m = 1- Here haloes are selected at the same epoch a as when 
the correlation function is calculated. The power index n and the expansion 
factor a are shown in each panel. 
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Figure 4b: The same as 4a, but here haloes are selected at an earlier epoch 
(when the expansion factor a has the lower value indicated in each panel) 
than when the correlation function is calculated (at the epoch with the higher 
a). 




Figure 5: The variances of halo counts in the simulations. Symbols show 
the results for FOF haloes with M > 32 (triangles), 64 (three-pointed stars), 
128 (squares) and 256 (crosses). Dashed curves (in the left panels) show the 
results for the corresponding SO haloes. The solid curves show the predic- 
tions of equation (23). The results shown in the right panels are for haloes 
selected at an earlier epoch (when the expansion factor a has the lower value 
indicated in each panel) than when the correlation function is calculated (at 
the epoch with the higher a). 




Figure 6a: The standard two-point correlation functions of haloes in the 
simulations. Squares are for FOF haloes, while dashed curves for SO haloes. 
The error bars represent bootstrap errors. The solid curves show the pre- 
dictions of equation (28). The thick ticks on the horizontal axies show the 
values of R where i^m = 1- Results are shown for haloes with mass M > 32 
(left panels) and M > 128 (right panels). Here haloes are selected at the 
same epoch a as when the correlation function is calculated. 




Figure 6b: The same as 6a, but here haloes are selected at an earlier epoch 
(when the expansion factor a has the lower value indicated in each panel) 
than when the correlation function is calculated (at the epoch with the higher 
a). 




Figure 7: The model predictions of the bias relation for spheres with R = 
0.05iv, plotted in linear coordinates to show the deviation from the linear 
relation (5h oc 6. Results are shown for haloes with mass M > 32 and M > 
128. For a given mass, the steeper curve shows the results for haloes identified 
at a = 9.35 and analyzed at a = 20.2, while the shallower one shows the 
results for haloes both identified and analysed at a = 20.2. 



